Issues in Information Hiding Transform Techniques*


1  INTRODUCTION 

Information hiding has emerged as an exciting and important research field.
Information hiding not only complements traditional obfuscation techniques
(e.g., [17]) but also brings new prospects to them. By definition, information
hiding hides a message (the embedded message) under a cover message to yield
the stego-message. Much of the research in information hiding has focused on
steganography and watermarking. Steganography refers to methods that are
used to transmit the embedded message without an observer being aware that
there is an embedded message in the cover message. The embedded message may
be fragile, i.e., easily broken in the face of attacks. With respect to
steganography, robustness is not a critical property; transparency is. The
related field of watermarking embeds a "watermark" for the purpose of
authentication, a crucial step for copyright protection and tamper proofing.
The embedded watermark may not be transparent, in the sense that it may be
perceivable, but it must not be easily removed from the stego-message. The
embedded watermark is usually required to be semi-fragile (i.e., destroyed if
changes exceed a limit) or robust. Johnson et al. [8] nicely state (their
concern is images) that "Traditional steganography conceals information;
watermarks extend information and may be considered attributes of the cover
image."

In our present experiments, digital images are used as the cover message
in which we embed the hidden information. Two common modes of embedding
are spatial embedding and transform embedding. Spatial embedding inserts
messages into image pixels, usually in the least significant bits¹ (LSB)²
[10]. LSB embedding has the merit of simplicity but lacks robustness: it is
susceptible to image-processing attacks. Error-correction coding has been
proposed for enhancing the robustness, but its effectiveness is limited to
low levels of noise [9][13]. If spatial embedding involves higher order
bits, one runs the very real risk of the steganography being

*  Research  supported  by  the  Office  of  Naval  Research. 

1  Early  experiments  of  embedding  messages  under  the  least  significant  bits  in  audio 
steganography  were  performed  by  Kang  [9]. 

2 Abbreviations  may  be  singular  or  plural. 
detected, and for watermarking the concern is that the cover image might be
degraded and/or the watermark may be easy to remove. In order to achieve
robust hiding, researchers have invoked transform domain techniques (e.g.,
frequency space) [5]. Transform embedding embeds a message by modifying
(selected) transform (e.g., frequency) coefficients of the cover message.
Ideally, transform embedding has the effect in the spatial domain of
apportioning the hidden information through different order bits in a manner
that is robust, yet hard to detect. Of course one must then be concerned with
detectability in the frequency domain, but at least the human visual system
(HVS) may be fooled. Therefore, hiding in the frequency domain presents its
own challenges (e.g., [5][7]). Since an attack, such as image processing,
usually affects a certain band of transform coefficients, the remaining
coefficients would remain largely intact. Hence, transform embedding is in
general more robust than spatial embedding.
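The spatial (LSB) embedding described above, and its fragility under a simple image-processing attack, can be illustrated with a minimal Python sketch. The function names (`embed_lsb`, `extract_lsb`, `smooth`), the one-dimensional pixel row and the neighbour-averaging "attack" are assumptions made for illustration only; they are not the scheme of [10].

```python
def embed_lsb(pixels, bits):
    """Return a copy of pixels with each message bit written into one LSB."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover")
    stego = list(pixels)
    for k, bit in enumerate(bits):
        stego[k] = (stego[k] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


def smooth(pixels):
    """Hypothetical attack: average each pixel with its right neighbour."""
    n = len(pixels)
    return [(pixels[k] + pixels[(k + 1) % n]) // 2 for k in range(n)]


cover = [52, 55, 61, 66, 70, 61, 64, 73]
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb(cover, message)
recovered = extract_lsb(stego, len(message))
after_attack = extract_lsb(smooth(stego), len(message))

print(stego)
print(recovered == message)      # the message survives when untouched
print(after_attack == message)   # but not after the smoothing attack
```

Each pixel changes by at most one grey level, which is why the embedding is hard to see, while even mild smoothing scrambles the low-order bits that carry the message.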

Extraction of the embedded message is often carried out by comparing the
stego-message with the cover message. This is practical for watermarking, but
one may not have the original cover message when dealing with a stego-message.
Without a cover image, embedding may involve a stego-key. The stego-key
would serve a similar purpose to the cover image in that it (hopefully)
enables us to determine the hidden message. Also note that for message
authentication, it may be sufficient only to prove the existence of the
embedded message, perhaps via a similarity measure (e.g., [5]). In the
absence of the original image, statistical methods based on detection
probability have also been proposed for extraction (e.g., [20]).

2  REVIEW 

In  this  section,  we  will  briefly  review  the  three  most  commonly  used  transform 
techniques:  DFT,  DCT  and  Wavelet. 

2.1  Discrete  Fourier  Transform:  DFT 

The DFT has its roots in Fourier series analysis. Recall that a periodic
time-domain function $f(t)$ can be decomposed into a series of sine (or
cosine) wave functions, where each has a frequency that is a multiple of a
constant (i.e., the 1st harmonic $\omega_0$).³ The goal is to find the
coefficient of each wave function. For the purpose of frequency domain
analysis, the exponential Fourier series is used in place of the sine or
cosine series, and its coefficient of the $n$th harmonic (i.e., $n\omega_0$)
is given by $F_n = (1/P)\int_0^P f(t)\exp(-i n \omega_0 t)\,dt$, where $i$
denotes the complex number $\sqrt{-1}$, $P$ denotes the duration of a period
and $\omega_0$ is $2\pi/P$.

Consider the one-dimensional discrete case in which $N$ samples $f(0), f(T),
\ldots, f((N-1)T)$ are taken at the sampling interval $T$. The sampled
sequence may not have a period, but in the DFT it is assumed that these $N$
samples constitute a

3 The constant $\omega_0$ is needed to assure the orthogonality between two wave functions.

period. As a result, the period of the sampled sequence becomes $NT$ and,
correspondingly, the constant frequency $\omega_0$ is $2\pi/NT$. The discrete
Fourier transform is obtained by substituting, respectively, $\omega_0$ with
$2\pi/NT$, $t$ with $kT$, $dt$ with $T$, $P$ with $NT$ and $n$ with $u$ in
the exponential Fourier series, i.e.,

$$F(u) = \frac{1}{N}\sum_{k=0}^{N-1} f(kT)\exp\left(-i2\pi\frac{uk}{N}\right), \qquad 0 \le u < N,$$


where $u$ is the index in the frequency domain. Here, the total number of
frequency components is also $N$. The lowest frequency component of the DFT
occurs at $u = 0$, where the frequency is 0. The highest frequency component
can be determined from the Nyquist sampling theorem and its value is
$1/(2T)$ Hz (or cycles/second). The index $u$ which corresponds to the
highest frequency component is $N/2$, right at the middle of the $N$
frequency indices.⁴ For the digital pictorial domain, the sampling interval
$T$ is measured in terms of, not time, but pixels between consecutive
samplings. In the case of one pixel per sampling, i.e., $T = 1$, the highest
frequency component becomes $1/2$ cycles/pixel.⁵ The highest frequency (or
the bandwidth) has been used in computing the lower bound of the hiding
capacity of a stego image, where the lower bound is computed from Shannon's
capacity measure of an additive white Gaussian noise (AWGN) channel⁶ with the
embedded message being viewed as the signal and the cover message as the
noise (e.g., [18][13]).
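As a small check of the statements about the frequency index, the following Python sketch evaluates the 1D DFT with the $1/N$ normalization used above. The test signal (samples alternating between +1 and -1, i.e., half a cycle per sample at $T = 1$) and the function name `dft` are assumptions chosen for illustration.

```python
import cmath


def dft(samples):
    """F(u) = (1/N) * sum_k f(kT) * exp(-i*2*pi*u*k/N), for 0 <= u < N."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]


N = 8
# Fastest oscillation representable at T = 1: +1, -1, +1, ... (1/2 cycle per sample).
alternating = [(-1) ** k for k in range(N)]
spectrum = dft(alternating)

# Index of the coefficient with the largest magnitude.
peak_index = max(range(N), key=lambda u: abs(spectrum[u]))
print(peak_index)
```

All of the energy of this signal lands on the single index $u = N/2$, the middle of the $N$ frequency indices, in agreement with the Nyquist argument.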

Let $I(i,j)$ denote the brightness value of the pixel at position $(i,j)$ of
an image. The 2D DFT is a natural extension of the 1D DFT, obtained by
applying the 1D DFT to a 2D matrix twice, and its period is given by the
dimension of the input image (i.e., $N \times M$), i.e.,

$$F(u,v) = \frac{1}{NM}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1} I(k,l)\exp\left(-i2\pi\left(\frac{uk}{N}+\frac{vl}{M}\right)\right), \qquad 0 \le u < N;\ 0 \le v < M. \qquad (1)$$

Its  backward  transform7  is  given  by 

4 Recall that sampling at time interval $T$ in the time domain yields a series of replicas
of the frequency spectrum separated by $2\pi/T$ in the frequency domain. The Nyquist
sampling theorem says the maximum sampling interval $T$ is reciprocally bounded by the
frequency bandwidth $W$, i.e., $2\pi/T \ge 2W$. Let $u_{max}$ denote the highest frequency
index. We have $2\pi/T = 2(u_{max}\omega_0)$ or $2u_{max}(2\pi/NT)$. Thus, $u_{max}$ is
equal to $N/2$. The highest frequency (i.e., $u_{max}\omega_0$) is $\pi/T$ radians/second
or $1/(2T)$ cycles/second.

5 For a digital image, the highest frequency of one direction may differ from that of the
other direction. Here, the two highest frequency components are assumed to be the same.

6 $C = W\log_2(1 + S/N)$, where $W$ is the bandwidth, $S$ denotes the energy measure of
the signal, and $N$ denotes the energy measure of the noise.

7 More precisely, the backward transform should be the inverse mapping $F^{-1}$. We use
$I(k,l)$ instead of $F^{-1}(k,l)$ for convenience.
$$I(k,l) = \sum_{u=0}^{N-1}\sum_{v=0}^{M-1} F(u,v)\exp\left(i2\pi\left(\frac{uk}{N}+\frac{vl}{M}\right)\right), \qquad 0 \le k < N;\ 0 \le l < M. \qquad (2)$$

Eq. (1) can also be written in the matrix form,

$$F = \frac{1}{NM}\,[U_0\ U_1\ \cdots\ U_{N-1}]^{T}\ I\ [V_0\ V_1\ \cdots\ V_{M-1}], \qquad (3)$$

where $V_i$ and $U_j$ denote the column vectors
$\{\exp(-i2\pi\frac{0 \cdot i}{M}), \exp(-i2\pi\frac{1 \cdot i}{M}), \ldots, \exp(-i2\pi\frac{(M-1)i}{M})\}$ and
$\{\exp(-i2\pi\frac{0 \cdot j}{N}), \exp(-i2\pi\frac{1 \cdot j}{N}), \ldots, \exp(-i2\pi\frac{(N-1)j}{N})\}$,
respectively. Note that, for a real-valued image, the DFT obeys the property
of symmetry, i.e., $F(u,v) = F^*(N-u, M-v)$,⁸ which can be seen by replacing
$u$ and $v$ with $N-u$ and $M-v$ in $\exp(-i2\pi(\frac{uk}{N}+\frac{vl}{M}))$.
The property of symmetry is useful for plotting the result of the DFT as
shown in the next section. The 2D DFT is a common instrument for analyzing
hiding capacity and is presently available in our xv tool.
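The forward and backward 2D transforms and the conjugate symmetry of the DFT of a real-valued image can be checked numerically. The sketch below is a direct (slow) evaluation of the double sums with the $1/NM$ factor on the forward transform; the 4 x 3 test image and the function names are assumptions for illustration, and indices such as $N-u$ are taken modulo $N$.

```python
import cmath


def dft2(image):
    """Forward 2D DFT with the 1/(N*M) factor on the forward transform."""
    n, m = len(image), len(image[0])
    return [
        [
            sum(
                image[k][l] * cmath.exp(-2j * cmath.pi * (u * k / n + v * l / m))
                for k in range(n)
                for l in range(m)
            ) / (n * m)
            for v in range(m)
        ]
        for u in range(n)
    ]


def idft2(coeffs):
    """Backward 2D DFT (no normalization factor)."""
    n, m = len(coeffs), len(coeffs[0])
    return [
        [
            sum(
                coeffs[u][v] * cmath.exp(2j * cmath.pi * (u * k / n + v * l / m))
                for u in range(n)
                for v in range(m)
            )
            for l in range(m)
        ]
        for k in range(n)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 10], [0, 2, 1]]  # N = 4 rows, M = 3 columns
F = dft2(image)
restored = idft2(F)
n, m = len(image), len(image[0])

# Largest deviation from F(u, v) = conj(F(N - u, M - v)), indices modulo N and M.
symmetry_error = max(
    abs(F[u][v] - F[(n - u) % n][(m - v) % m].conjugate())
    for u in range(n)
    for v in range(m)
)
print(symmetry_error)
```

Because of this symmetry, roughly half of the coefficients are redundant for a real image, which is what makes a centred display of the spectrum natural.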


2.2  Discrete  Cosine  Transform:  DCT 

The DCT was the major mathematical framework for image compression in JPEG
until JPEG 2000 was introduced. The DCT improves on the DFT by eliminating
the high frequency components induced by the sharp discontinuities at the
boundary between two consecutive periods in the time (or spatial) domain of
a periodic signal. Representing such a sharp value change requires non-zero
high frequency DFT coefficients. If for compression reasons all high
frequency components of the DFT, including those generated from the sharp
discontinuities, are deleted, such deletion will cause distortion to the
original image. To eliminate those undesirable high frequency components,
the DCT concatenates a period with the mirror image of its adjacent period.
This new period has twice the sample points, but no sharp value change at
the boundary with its neighbors. Concatenation of one period and the mirror
image of the adjacent period defines an even function and hence yields an
all real-valued transform code.⁹ This is a big advantage in computation! The
DCT can be obtained from the DFT of a mirrored 2N sample sequence, where the
DCT is the first N sample points. The commonly used form of the DCT was
derived from a class of discrete Chebyshev polynomials [1]. The derivation
of the 2D

8 $F^*(\cdot,\cdot)$ is the complex conjugate of $F(\cdot,\cdot)$.

9 Suppose a function, $g(t)$, whose domain is the interval $[0,P)$, is concatenated with its
shifted mirror image, $g(2P - t)$. The Fourier transform of this concatenated function is
given by $(1/2P)\int_0^{2P}(g(t) + g(2P - t))\exp(-i n\omega_0 t)\,dt$, where
$\exp(\pm i n\omega_0 t) = \cos(n\omega_0 t) \pm i\sin(n\omega_0 t)$. It can be rewritten as
$(1/2P)\left(\int_0^{P} g(t)\exp(-i n\omega_0 t)\,dt + \int_P^{2P} g(2P - t)\exp(-i n\omega_0 t)\,dt\right)$.
By replacing $2P - t$ with $t$ in the second term, the Fourier transform becomes
$(1/P)\int_0^{P} g(t)\cos(n\omega_0 t)\,dt$.
DCT code is similar to that of the DFT. The DCT code of an image brightness
matrix $I(i,j)$ ($0 \le i < N$, $0 \le j < M$) is given by

$$S(u,v) = c(u,v)\sum_{i=0}^{N-1}\sum_{j=0}^{M-1} I(i,j)\cos\frac{\pi(2i+1)u}{2N}\cos\frac{\pi(2j+1)v}{2M}, \qquad (4)$$

where $0 \le u < N$ and $0 \le v < M$, and $c(u,v)$ is given by
$c(0,0) = \sqrt{1/N}\sqrt{1/M}$, $c(0,v) = \sqrt{1/N}\sqrt{2/M}$,
$c(u,0) = \sqrt{2/N}\sqrt{1/M}$, and $c(u,v) = \sqrt{2/N}\sqrt{2/M}$, $u, v > 0$.
For each $u$ and $v$, the different values of
$\cos\frac{\pi(2i+1)u}{2N}\cos\frac{\pi(2j+1)v}{2M}$, $0 \le i < N$ and
$0 \le j < M$, form an $N \times M$ DCT basis matrix. The DCT basis matrices
are orthonormal. Coefficients produced from these basis matrices are
uncorrelated and hence can be processed independently. The backward DCT is
shown below.

$$I(i,j) = \sum_{u=0}^{N-1}\sum_{v=0}^{M-1} c(u,v)\,S(u,v)\cos\frac{\pi(2i+1)u}{2N}\cos\frac{\pi(2j+1)v}{2M}. \qquad (5)$$

In JPEG, the DCT is applied to each block of 8×8 pixels from the input image,
with the image being partitioned into a number of blocks [15].
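The orthonormality of the DCT basis can be checked with a direct evaluation of the forward and backward sums. The sketch below assumes the standard orthonormal normalization, in which the factor is $\sqrt{1/N}$ for index 0 and $\sqrt{2/N}$ otherwise in each direction; the names `dct2`, `idct2` and `norm`, and the 4×4 test blocks, are illustrative assumptions rather than the JPEG implementation.

```python
import math


def norm(index, size):
    """Per-direction normalization factor of the orthonormal DCT."""
    return math.sqrt(1.0 / size) if index == 0 else math.sqrt(2.0 / size)


def basis(i, u, size):
    """Cosine term cos(pi * (2i + 1) * u / (2 * size))."""
    return math.cos(math.pi * (2 * i + 1) * u / (2 * size))


def dct2(block):
    """Forward 2D DCT of an N x M block (direct double sum)."""
    n, m = len(block), len(block[0])
    return [
        [
            norm(u, n) * norm(v, m) * sum(
                block[i][j] * basis(i, u, n) * basis(j, v, m)
                for i in range(n)
                for j in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]


def idct2(coeffs):
    """Backward 2D DCT, the transpose of the forward transform."""
    n, m = len(coeffs), len(coeffs[0])
    return [
        [
            sum(
                norm(u, n) * norm(v, m) * coeffs[u][v] * basis(i, u, n) * basis(j, v, m)
                for u in range(n)
                for v in range(m)
            )
            for j in range(m)
        ]
        for i in range(n)
    ]


flat = [[8.0] * 4 for _ in range(4)]          # constant block: only the DC term is non-zero
flat_code = dct2(flat)

ramp = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ramp_back = idct2(dct2(ramp))                  # round trip should reproduce the block
print(flat_code[0][0])
```

A constant block concentrates all of its energy in the single coefficient $S(0,0)$, which is the property that block-based compression exploits.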


2.3  Discrete  Wavelet  Transform:  DWT 

The  wavelet  transform  (WT)  has  been  adopted  as  the  standard  tool  in  JPEG 
2000  still  image  compression  as  it  produces  a  higher  compression  ratio  than 
the  DCT  does  [4].  Studies  of  image  compression  also  show  that  the  wavelet 
transform  provides  better  frequency  and  time  (spatial)  resolution  than  other 
transform  techniques  do. 

The  DFT  gives  an  excellent  description  of  the  frequency  responses  of  a  signal, 
but  no  information  about  when  (where)  particular  frequency  components  occur 
in  time  (space).  The  Short-time  Fourier  Transform  (STFT)  improves  the  DFT 
by  breaking  the  signal  into  intervals  of  fixed  length  and  applying  the  Fourier 
analysis  to  each  interval.  A  particular  frequency  response  that  occurs  only  at  a 
certain  interval  can  be  captured  with  STFT.  However,  fixed  length  intervals  have 
their  restrictions.  Although  a  short  fixed  length  interval  is  good  for  identifying 
local  variation  in  time  (space),  it  is  inadequate  to  describe  frequency  responses 
whose  cycles  exceed  the  length  of  the  interval.  The  major  changes  from  STFT 
to  WT  are  perhaps  the  selection  of  base  functions  (e.g.,  the  sinusoidal  functions 
in  Fourier  transform)  and  the  windowing  operation.  A  base  function  of  wavelet 
transform  can  be  any  function  with  zero  mean  and  finite  energy  (called  the 
wavelet).¹⁰ The entire set of base functions is mutually orthonormal (like
sinusoidal bases) and generated from a single base function (called the mother
wavelet) by scaling and translation. In WT, a base function is locally applied

10 That is, $\int \Psi(t)^2\,dt < \infty$ and hence, a base function is in the vector space
$L^2$. Because of the finite energy requirement, $\Psi(t)$ is restricted to a narrow band,
which gives the wavelet its frequency localization capability [16]. A sine (cosine)
function cannot be a base.
to a particular area of the signal at a time. Localization is realized through
windowing, where the size of the window, indicating resolution, is not a
constant, unlike the fixed interval used in STFT. Only the base function whose
scale (or cycle) is compatible with the size of the window is used. As a
result, base functions of slower cycles are used under a larger window, while
base functions of faster cycles are used under a shorter window.

In  the  case  of  data  compression,  the  implementation  of  the  DWT  is  similar 
to that of subband coding [16], where at each stage a coarse overall shape and
details  of  the  data  obtained  from  the  previous  stage  are  derived.  Encoding 
in  the  DWT  proceeds  with  decomposition  and  downsampling.  Decomposition 
separates  data  into  frequency  bands  via  high-pass  and  low-pass  filtering.  The 
functions  of  a  high-pass  filter  are  the  WT  base  functions,  while  the  functions 
of  the  low-pass  filter  are  the  complements  of  the  base  functions.  Downsampling 
removes  data  which  is  not  needed  for  future  reconstruction.  Decoding  on  the 
other  hand  involves  up-sampling  to  adjust  dimensionality  and  recombining  data 
from  different  bands. 

Call the output from high-pass and low-pass filtering the filtered transform
coefficients. Let $h$, $l$ and $\otimes$ denote the high-pass filter, the
low-pass filter and the convolution operation, respectively. Consider the
case where the low-pass filter is a 2-tap averaging operator (i.e.,
$l(0) = 1/2$, $l(1) = 1/2$) and the high-pass filter is a difference operator
(i.e., $h(0) = 1/2$, $h(1) = -1/2$; the Haar transform). Let
$X = [x_0, \ldots, x_7]^T$. The outcomes of filtering are the high-filtered
coefficients $h \otimes X$ and the low-filtered coefficients $l \otimes X$,
i.e.,

$$l \otimes X = \frac{1}{2}\,[x_7 + x_0,\ x_0 + x_1,\ \ldots,\ x_5 + x_6,\ x_6 + x_7]^T \qquad (6)$$

$$h \otimes X = \frac{1}{2}\,[x_0 - x_7,\ x_1 - x_0,\ \ldots,\ x_6 - x_5,\ x_7 - x_6]^T \qquad (7)$$

The original signal can be reconstructed from those high-filtered and
low-filtered coefficients by, for instance, adding them element by element:
since the factor 1/2 is already included in both filters,
$(l \otimes X)_n + (h \otimes X)_n = x_n$. In fact, it can be shown that
reconstruction needs just half the number of coefficients from each set and
hence, each of the two sets is down-sampled to a half. If downsampling $D$
is picking up every other coefficient from $l \otimes X$ and $h \otimes X$,
it has the form

$$D = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix} \qquad (8)$$

The relationship between original data and the transform code is described in
the matrix form as follows,

$$W_a[X] = \begin{bmatrix} D X_l \\ D X_h \end{bmatrix}, \qquad (9)$$

where $X_l = l \otimes X$, $X_h = h \otimes X$, and $W_a$ is the DWT.

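The wavelet transform may be applied to each set of filtered transform
coefficients to obtain a more detailed and a coarser description. For
instance, after downsampling, we have

Stage 1:

$$\text{coarse}: \tfrac{1}{2}\,[x_7 + x_0,\ x_1 + x_2,\ x_3 + x_4,\ x_5 + x_6]^T \qquad (10)$$

$$\text{detail}: \tfrac{1}{2}\,[x_0 - x_7,\ x_1 - x_2,\ x_3 - x_4,\ x_5 - x_6]^T \qquad (11)$$

We may continue this process recursively to get further decomposition.

Stage 2:

$$\text{coarse}: \tfrac{1}{4}\,[x_7 + x_0 + x_1 + x_2,\ x_3 + x_4 + x_5 + x_6]^T \qquad (12)$$

$$\text{detail}: \tfrac{1}{4}\,[x_7 + x_0 - x_1 - x_2,\ x_3 + x_4 - x_5 - x_6]^T \qquad (13)$$

Stage 3:

$$\text{coarse}: \tfrac{1}{8}\,[x_7 + x_0 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6] \qquad (14)$$

$$\text{detail}: \tfrac{1}{8}\,[x_7 + x_0 + x_1 + x_2 - x_3 - x_4 - x_5 - x_6] \qquad (15)$$

The coefficient matrix is

$$\left[\tfrac{1}{8}(x_7 + x_0 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6),\ \tfrac{1}{8}(x_7 + x_0 + x_1 + x_2 - x_3 - x_4 - x_5 - x_6),\right.$$
$$\tfrac{1}{4}(x_7 + x_0 - x_1 - x_2),\ \tfrac{1}{4}(x_3 + x_4 - x_5 - x_6),$$
$$\left.\tfrac{1}{2}(x_0 - x_7),\ \tfrac{1}{2}(x_1 - x_2),\ \tfrac{1}{2}(x_3 - x_4),\ \tfrac{1}{2}(x_5 - x_6)\right]^T$$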

Note that the first element of the coefficient matrix is the average of all
values. For the 2D DWT (i.e., $W_a X W_a^T$), the transform codes of an image
are divided into four pieces, often labeled as {LL, HL, LH, HH}. LL
corresponds to the coefficients resulting from low-pass filtering applied
twice and carries the most important information from the original image.
Its size is just one quarter of the image. The remaining three pieces are
the detail components. Similar to the example shown above, for a better
compression result, the high and low filters are applied again to the four
(usually, just the LL) pieces.
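The recursive averaging and differencing can be sketched in a few lines of Python. Note the assumptions: the sketch pairs neighbours as (x0, x1), (x2, x3), ..., which is the usual non-overlapping Haar pairing and differs from the cyclic pairing starting at x7 used in the listing above; the input length must be a power of two; and all function names are illustrative.

```python
def haar_step(values):
    """One stage: pairwise averages (coarse) and pairwise half-differences (detail)."""
    coarse = [(values[k] + values[k + 1]) / 2 for k in range(0, len(values), 2)]
    detail = [(values[k] - values[k + 1]) / 2 for k in range(0, len(values), 2)]
    return coarse, detail


def haar_decompose(values):
    """Return [overall average, coarsest detail, ..., finest details]."""
    coarse, details = list(values), []
    while len(coarse) > 1:
        coarse, detail = haar_step(coarse)
        details = detail + details  # coarser details go in front of finer ones
    return coarse + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose by undoing one stage at a time."""
    values, pos = coeffs[:1], 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for c, d in zip(values, detail):
            expanded += [c + d, c - d]  # (a+b)/2 + (a-b)/2 = a, (a+b)/2 - (a-b)/2 = b
        values = expanded
    return values


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
code = haar_decompose(signal)
restored = haar_reconstruct(code)
print(code)
```

As in the text, the first coefficient is the average of all samples, and half of the coefficients at each stage are enough for perfect reconstruction.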
3  DISCUSSION


In  this  section,  we  show  our  experimental  results  with  transform  embedding, 
and  discuss  two  cases  related  to  robustness  and  detection  of  embedded  messages. 
Embedding  is  based  on  the  following  steps:  (1)  Apply  the  transform  algorithm  to 
the  cover  and  the  embedded  data,  (2)  select  the  embedding  method  to  combine 
the  two  sets  of  coefficients,  and  (3)  apply  the  inverse  transform  to  the  combined 
coefficients  to  produce  the  stego  image.  In  watermarking,  extraction  of  the 
embedded  message  usually  involves  the  subtraction  of  the  coefficients  of  the 
cover  from  the  coefficients  of  the  stego,  whereas  in  steganography,  extraction 
may  involve  the  use  of  the  pre-assigned  stego  key. 

3.1  Experimental  Results 

To illustrate transform domain hiding, we embed an image (Waterdrop) under
a cover image (Washington Monument), where the two images are of the same
size. Let $F_e$ and $F_c$ denote the transform code of the embedded and the
cover images, respectively. (Note that the embedded message is not
necessarily transformed.) The embedding formula is in general described as

$$F_s(u,v) = F_c(u,v) + J(u,v) * F_e(u,v), \qquad 0 \le u < M,\ 0 \le v < N,$$

where $J(u,v)$ denotes the perceptual factor calculated for each frequency
component [19]. In its simplest form, $J(u,v)$ can be either additive (e.g.,
$J(u,v) = a$), where $a$ is an attenuation factor for adjusting the magnitude
of embedded coefficients and $F_s = a * F_e + F_c$, or multiplicative (e.g.,
$J(u,v) = a * F_c(u,v)$), where the coefficient of the cover, $F_c(u,v)$, is
involved, and $F_s = F_c * (1 + a * F_e)$. The advantage of embedding in the
additive form is its efficient invertibility [5] for extraction. Not all
coefficients of the cover are used for embedding. Transform coefficients of
low frequency components, which contain the most important overall
information of the original image, are usually excluded from being used for
embedding. For instance, in [2], coefficients from the middle frequency (DWT)
bands are randomly selected for embedding. In our current experiments, we set
$J(u,v)$ to 1 and linearly combined the two sets of coefficients, i.e.,

$$F_s = a * F_e + (1 - a) * F_c,$$

in order to ensure that pixel values obtained from the inverse transformation
will be in the proper dynamic range. (The scaling factor is chosen to be
$a = 0.05$.) Since addition in the Fourier domain results in addition in the
time (spatial) domain, linear combination assures that image values extracted
from $F_s$ will not fall outside the allowed range. (On the other hand, linear
combination does not make the most use of the transform domain, since
embedding in one domain is basically equivalent to embedding in the other.)
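The remark that linear combination in the Fourier domain is equivalent to linear combination in the spatial domain follows from the linearity of the DFT, and it can be checked on a toy example. The sketch below uses a single row of 8 pixels and a direct 1D DFT instead of full images; the pixel values and function names are assumptions for illustration, and only the value a = 0.05 is taken from the text.

```python
import cmath


def dft(samples):
    """Forward 1D DFT with the 1/N factor."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]


def idft(coeffs):
    """Backward 1D DFT (no normalization factor)."""
    n = len(coeffs)
    return [
        sum(coeffs[u] * cmath.exp(2j * cmath.pi * u * k / n) for u in range(n))
        for k in range(n)
    ]


A = 0.05  # scaling factor from the text

cover = [0, 255, 128, 64, 200, 17, 90, 255]      # hypothetical cover pixels
embedded = [255, 0, 30, 220, 5, 140, 60, 0]      # hypothetical embedded pixels

Fc, Fe = dft(cover), dft(embedded)
Fs = [A * fe + (1 - A) * fc for fe, fc in zip(Fe, Fc)]

stego_from_transform = [z.real for z in idft(Fs)]
stego_from_pixels = [A * e + (1 - A) * c for e, c in zip(embedded, cover)]
print([round(s, 2) for s in stego_from_transform])
```

Both routes give the same stego pixels, and because the weights a and 1 - a sum to one, every value stays within the 0 to 255 range of the inputs.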

The results of our experiments are shown in Figure 1 to Figure 6. Comparing
the original (Figure 1) and the stego (Figure 5), perceptually the two show no
difference. The companion figures to the images are their corresponding DFT
matrices. Note that the coefficient at the left corner of a DFT matrix
obtained from (1) should be the lowest frequency component (i.e., u = 0,
v = 0, or the DC). However, because of the symmetric property of the DFT, it
is customary to display the DC component at the center, and the further away
from the center a DFT component is, the higher is its corresponding
frequency. In our present display, the frequency component at (u, v) is moved
to a new position by

((M/2 - 1) - u, (N/2 - 1) - v)    if 0 <= u < M/2;  0 <= v < N/2
((3M/2 - 1) - u, (N/2 - 1) - v)   if M/2 <= u < M;  0 <= v < N/2
((M/2 - 1) - u, (3N/2 - 1) - v)   if 0 <= u < M/2;  N/2 <= v < N
((3M/2 - 1) - u, (3N/2 - 1) - v)  if M/2 <= u < M;  N/2 <= v < N

To further enhance the DFT display, a logarithmic transform is applied to
adjust the dynamic range of coefficients and the result is normalized to be
within the level of 0 to 255 (in order for our xv tool to display). Since the
magnitude of the DC component is far larger than that of any other frequency
component, the DC component is actually removed from the DFT image (seen as a
black dot at the center).
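The display mapping and the range adjustment can be written compactly. The sketch below assumes even M and N, and it assumes log(1 + |F|) as the logarithmic transform, since the exact form is not stated; the function names are hypothetical.

```python
import math


def display_position(u, v, m, n):
    """Map frequency index (u, v) to its display position (DC next to the centre)."""
    x = (m // 2 - 1) - u if u < m // 2 else (3 * m // 2 - 1) - u
    y = (n // 2 - 1) - v if v < n // 2 else (3 * n // 2 - 1) - v
    return x, y


def log_scale(magnitudes):
    """Compress the dynamic range with log(1 + |F|) and normalize to 0..255."""
    logs = [math.log1p(abs(value)) for value in magnitudes]
    top = max(logs)
    if top == 0:
        return [0 for _ in logs]
    return [round(255 * value / top) for value in logs]


M = N = 8
positions = {
    display_position(u, v, M, N) for u in range(M) for v in range(N)
}
dc_position = display_position(0, 0, M, N)
nyquist_position = display_position(M // 2, N // 2, M, N)
levels = log_scale([0.0, 1.0, 1000.0])
print(dc_position, nyquist_position, levels)
```

The mapping is one-to-one, places the DC component next to the centre of the display, and sends the highest frequency index (M/2, N/2) to a corner.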


Fig. 2. DFT of the cover

Fig. 3. The embedded

Fig. 4. DFT of the embedded

Fig. 5. The stego

Fig. 6. DFT of the stego


At present, we have not yet implemented adaptive selection of transform
coefficients. We do not suggest embedding spatial data (i.e., pixels) of the
embedded image under the frequency coefficients of the cover (i.e.,
$I_e + F_c$) because the frequency coefficients usually have a much larger
dynamic range. Hence, changes to the frequency components (due to rounding
and inverse transformation) can cause irremediable distortion to the
embedded spatial data.

Extraction is implemented by reversing the embedding steps, i.e.,
$(F_s' - (1 - a)F_c)/a = F_e'$, where $'$ indicates the change of values due
to image processing attacks. The embedded image extracted from the stego
(Figure 7) also appears to be nearly identical to the original Waterdrop
image. However, the significant reduction in magnitude of frequency
coefficients during embedding taxes the quality when image compression is
applied. To the right of Figure 7 is another extracted image (Figure 8),
obtained after applying JPEG to the stego image. The grossly smeared image
shows the need for more robust embedding.
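Why a small scaling factor hurts after compression can be seen from the extraction formula itself: any change in a stego coefficient is divided by a, i.e., magnified by 1/a = 20 for a = 0.05. The sketch below uses a handful of made-up coefficient values and a uniform change of +1 as a stand-in for rounding or compression loss; these values and the function names are assumptions for illustration.

```python
A = 0.05  # scaling factor from the text


def embed(cover_coeffs, hidden_coeffs, a=A):
    """Linear combination Fs = a * Fe + (1 - a) * Fc, coefficient by coefficient."""
    return [a * e + (1 - a) * c for c, e in zip(cover_coeffs, hidden_coeffs)]


def extract(stego_coeffs, cover_coeffs, a=A):
    """Reverse the embedding: Fe' = (Fs' - (1 - a) * Fc) / a."""
    return [(s - (1 - a) * c) / a for s, c in zip(stego_coeffs, cover_coeffs)]


cover = [120.0, 64.0, 200.0, 33.0]     # hypothetical cover coefficients
hidden = [10.0, 250.0, 90.0, 180.0]    # hypothetical embedded coefficients

stego = embed(cover, hidden)
clean = extract(stego, cover)

# Stand-in for an attack: every stego coefficient is changed by +1.
attacked = [s + 1.0 for s in stego]
noisy = extract(attacked, cover)
errors = [n - h for n, h in zip(noisy, hidden)]
print([round(e, 6) for e in errors])
```

A change of one unit in the stego coefficients becomes an error of about twenty units in the extracted coefficients, which is consistent with the smeared image obtained after JPEG compression.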


Fig. 7. The extracted embedded image    Fig. 8. JPEG (Quality 75%)¹¹

For comparison, Figures 9 & 10 show the extracted images in the case where
the least 2 significant bits from the spatial domain are used for embedding
[10].

11 The quality value is expressed on the 0..100 scale recommended by the Independent JPEG
Group. It is related to the DCT quantization.

Fig. 9. L2SB embedding (Quality 100%)    Fig. 10. L2SB (Quality 75%)


The  outcome  supports  our  observation  that  LSB  embedding  is  susceptible  to 
image  processing  attacks. 

The result of embedding with the DWT is similar to that of the DFT and
is shown in Figures 11 & 12. The DWT does not provide better robustness;
robustness is not a property of transform algorithms.


3.2  Detection 

For embedded data to be undetectable, it needs to be transparent in both the
spatial and the transform domains. Manjunath et al. [2] proposed the method
of embedding under the DWT coefficients, where only the coefficients in the
middle frequency range are used. That is, in Figure 11, embedding involves
all frequency bands except the area of the upper left corner (corresponding
to the lower frequency bands) and the lower right corner (corresponding to
the higher frequency band). The cover and the stego images are shown in
Figures 13 & 14, where both images were taken from a publicly available web
site [12]. The two show no visually significant difference. At least, they
both look legitimate. However, visual transparency in the spatial domain
does not imply undetectability. In fact, we can effectively show that
embedded information exists in Figure 14. Our detection method is based on
frequency domain analysis. We applied the DFT to both the cover and the
stego images of Figures 13 & 14 (only on the Red color byte). Their DFT
matrices are shown in Figures 15 & 16, where, to highlight the contrast,
only the most significant bit is used in the display. The image with
embedded data shows a striking bright diamond pattern that surrounds the
center, while the cover image (Figure 13) with no embedding has a common
radial shape. Recall that in the DFT display the frequency components that
correspond to the highest frequency are located in the corner areas, those
corresponding to the lowest frequency are in the center, and coefficients on
the band of the diamond belong to the middle frequency range. The diamond
pattern is also seen in several stego images we have tested.¹² As a result,
this seemingly transparent embedding method fails our simple detection test.
The embedding technique proposed in [2] is valuable if the stego image of
Figure 14 is for watermarking, but not for steganography. Note that
watermarking was the intention of [2].


Fig  15.  DFT  of  cover  Fig  16.  DFT  of  stego 


12 The diamond shape in some images is not so clear.

So far the strongest result on detection was perhaps made by J. Fridrich
at the IHW2001, where she claimed that her method can potentially detect
messages as short as any single bit change in a JPEG image.¹³ Her method
examines whether or not an 8×8 block of JPEG pixels could have been produced
by any block of quantized DCT coefficients (also in [6]). This result is
interesting because JPEG is frequently used. We are currently analyzing this
approach and studying its applicability to other transform methods.

3.3  Robustness 

To improve robustness, it may be necessary to reduce the size of the embedded
data and embed it multiple times under different parts of the selected
coefficients, where each embedding responds to a particular attack in a
different way. Interesting work on robustness was recently reported in [11]
(called cocktail), and it is one of the few methods claimed to be very robust
against a variety of attacks. The basic observation in [11] is that most
attacks will cause the magnitudes of more than 50% of the frequency
coefficients to either increase or decrease. Thus, it makes sense to embed
the data twice, with one embedding handling the increase and the other
embedding handling the decrease. As a result, one embedding is expected to
have a higher chance of surviving any given attack.

Can the cocktail embedding method be applied to improve our present NRL
L2SB [14] embedding? Unfortunately, it cannot. Recall that NRL L2SB embeds
a datum under the two least significant bits (so its dynamic range is from
0 to 3) of a pixel whose position is specified by a pre-assigned stego key. The
stego key is basically a long-term key and is independent of cover images. The
cocktail method was designed for watermarking, while NRL L2SB was used
to demonstrate the concept of steganography. NRL L2SB is used to extract
the embedded message, not just to verify its existence as many watermarking
methods do.
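
The embedding recalled above can be illustrated with a small sketch. It is a toy under stated assumptions, not the NRL L2SB implementation of [14]: the image is a list of rows of 8-bit values, the stego key is modeled as a plain list of (row, column) positions, and the function names are hypothetical.

```python
def embed_l2sb(pixels, key, data):
    """Write each 2-bit datum into the two least significant bits of the
    pixel at the matching key position; returns a modified copy."""
    if len(data) > len(key):
        raise ValueError("stego key is too short for the message")
    stego = [row[:] for row in pixels]
    for (i, j), datum in zip(key, data):
        if not 0 <= datum <= 3:
            raise ValueError("each datum must fit in two bits (0..3)")
        # Clear the two low bits, then set them to the datum
        stego[i][j] = (stego[i][j] & ~0b11) | datum
    return stego

def extract_l2sb(stego, key, count):
    """Read back `count` data from the two low bits at the key positions."""
    return [stego[i][j] & 0b11 for (i, j) in key[:count]]
```

Extraction needs only the stego image and the key, not the cover image, and each embedded pixel differs from its cover value by at most 3.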

We have not yet found a sound method that ensures the robustness of NRL
L2SB. In the following, we show a simplistic scheme that may be useful to
protect the embedded data against 2x2 low-pass averaging filtering (e.g.,
$\begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix}$) and high-pass
difference filtering (e.g.,
$\begin{pmatrix} 1/4 & -1/4 \\ -1/4 & 1/4 \end{pmatrix}$)
attacks. Assume position (i,j) of the cover image I is chosen for embedding.
Consider the following two cases.

13 The JPEG image generation involves the following steps. For a given input image (I),

•  divide I into a number of 8x8 blocks,

•  compute the DCT of each block to yield the DCT coefficient matrix,

•  quantize the DCT coefficients,

•  evaluate the inverse DCT of the quantized coefficient matrix, and

•  round the values to obtain the final JPEG image.

Case 1: average filtering. In the case of averaging filtering, we also embed
the same datum under the three neighbors (i-1,j-1), (i-1,j) and (i,j-1). The 2x2
averaging filtering computes

\[
\frac{I_s(i-1,j-1) + I_s(i-1,j) + I_s(i,j-1) + I_s(i,j)}{4}
\]

and stores the result back to position (i,j), where $I_s(\cdot,\cdot)$ denotes
the pixel value of the stego image at the given position. As a result, the
embedded value at position (i,j) is preserved under this scheme. Note that any
pixel value $I_s(k,l)$ in the [0,255] range can be represented as the sum of a
multiple of 4 and a remainder, i.e.,

\[
I_s(k,l) = 4m + r,
\]

where m is a value in [0,63] and r is in [0,3]. If the position (k,l) is selected for
embedding, then r denotes the value of the embedded datum. Since each of the
four neighbor pixels has the same r, the result of averaging the four pixel values
will still have the form $4m' + r$, with the same r and some number $m' \in [0,63]$.
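
The arithmetic of Case 1 can be checked with a short sketch. Writing the four pixels as 4m_k + r, their mean works out to (m_1 + m_2 + m_3 + m_4) + r, so the residue r is read back from the two low bits when the sum of the m_k is a multiple of 4, for instance in a locally flat region where the four m_k coincide. The helper names below are hypothetical, and the filter is modeled as an exact mean with no rounding.

```python
def split(pixel):
    """Write an 8-bit pixel value as 4*m + r; returns (m, r)."""
    return divmod(pixel, 4)

def embed_same_residue(ms, r):
    """Build the four stego pixels 4*m_k + r of a 2x2 neighborhood."""
    return [4 * m + r for m in ms]

def average_2x2(block):
    """Mean of the four pixels; exact because their sum is a multiple of 4."""
    return sum(block) // 4

def residue_survives(ms, r):
    """True when the averaged pixel still carries r in its two low bits."""
    return split(average_2x2(embed_same_residue(ms, r)))[1] == r
```

With m values (10, 10, 10, 10) and r = 2 the average is 42 = 4*10 + 2 and the datum is recovered. With (10, 11, 12, 13) the average is 48, whose two low bits are 0, so a practical scheme would also need to control the sum of the m values.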


Case 2: difference filtering. The difference filtering, which calculates

\[
\frac{I_s(i-1,j-1) + I_s(i,j) - I_s(i-1,j) - I_s(i,j-1)}{4},
\]

is more involved. In order to preserve the embedded value, we store an embedded
value under not one, but two positions. Suppose the embedded value is a “2”,
which occupies the two least significant bits as 1 and 0, in order from the higher
bit to the lower bit. We embed the 1 and the 0 in separate positions.

For “1” embedding, we embed the value 1 under the pixel at (i,j), 0 at (i-1,j),
0 at (i,j-1) and 3 under (i-1,j-1).

For “0” embedding, we embed the value 0 at all four neighbor pixels, which
do not overlap with those used for “1” embedding.

This  scheme  will  get  the  “1”  (or  “0”)  back  at  position  (i,j).  To  extract,  two 
consecutive  positions  are  decoded  together. 
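
The two residue patterns of Case 2 can be traced with a small sketch. It assumes the difference filter takes the two diagonal pixels (i-1,j-1) and (i,j) with a plus sign and the other two with a minus sign, divided by 4, and it uses hypothetical helper names. Under that assumption the signed residues sum to 4 for a “1” and to 0 for a “0”, so the filter output equals (m_a + m_d - m_b - m_c) plus the embedded bit, and in a locally flat region it equals the bit itself.

```python
# Residues for positions (i-1,j-1), (i-1,j), (i,j-1), (i,j), in that order
RESIDUES = {1: (3, 0, 0, 1), 0: (0, 0, 0, 0)}

def embed_bit(ms, bit):
    """Build the four stego pixels 4*m_k + r_k that carry one bit."""
    return [4 * m + r for m, r in zip(ms, RESIDUES[bit])]

def difference_2x2(block):
    """Assumed difference filter; exact because the numerator is a multiple of 4."""
    a, b, c, d = block
    return (a + d - b - c) // 4

def embed_datum(ms_hi, ms_lo, datum):
    """Split a 2-bit datum into its high and low bits, one 2x2 block each."""
    return embed_bit(ms_hi, datum >> 1), embed_bit(ms_lo, datum & 1)

def decode_flat(block_hi, block_lo):
    """Decode two consecutive blocks together; valid for flat regions only."""
    return 2 * difference_2x2(block_hi) + difference_2x2(block_lo)
```

For example, embedding the datum 2 over two flat blocks with m = 20 gives the pixel blocks (83, 80, 80, 81) and (80, 80, 80, 80), whose filter outputs are 1 and 0, which decode back to 2.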

The length of the stego key under this embedding scheme will increase
significantly. The length for embedding against the average filtering becomes
4 times its original length, and the length for the case of difference filtering
becomes 8 times. The total length is 12 times the original one. We divide the
cover image into two parts at the ratio 1:2, with the smaller part for embedding
against the average filtering attack and the larger one for the case of
difference filtering. The elongated stego key will inevitably increase the
detectability of embedded messages. We are investigating more general robust
embedding schemes for steganography. Since in steganography the cover image is
usually not available for extraction, robust embedding is a more challenging
issue for steganography than for watermarking.
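
The key-length arithmetic above can be written out as follows, with L standing for the original key length (a symbol introduced here for illustration). Reading the factor of 8 as two bit positions of four pixels each is an interpretation of the scheme, not a statement from the text.

```latex
\begin{align*}
L_{\mathrm{avg}}  &= 4L, \\
L_{\mathrm{diff}} &= 2 \times 4L = 8L, \\
L_{\mathrm{avg}} + L_{\mathrm{diff}} &= 12L, \\
L_{\mathrm{avg}} : L_{\mathrm{diff}} &= 1 : 2 .
\end{align*}
```

The 1:2 ratio matches the division of the cover image into its two parts.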


4  FUTURE  WORK 

Part of our future research will be on the issues of robustness and detectability
of information hiding. We showed that a watermarked image in which the
watermark is perceptually invisible in the spatial domain may fail our
detectability test. Our approach to detectability is based on the DFT domain
analysis. We proposed a method for protecting data embedded under LSBs against
two specific forms of filtering. The two methods need to be refined and
expanded for more general applications.